Accounting for noise when clustering biological data

Authors

  • Roman Sloutsky
  • Nicolas Jimenez
  • Sanjay Joshua Swamidass
  • Kristen M. Naegle
Abstract

Clustering is a powerful and commonly used technique that organizes and elucidates the structure of biological data. Clustering data from gene expression, metabolomics and proteomics experiments has proven useful for deriving a variety of insights, such as the shared regulation or function of biochemical components within networks. However, experimental measurements of biological processes are subject to substantial noise, stemming from both technical and biological variability, and most clustering algorithms are sensitive to this noise. In this article, we explore several methods of accounting for noise when analyzing biological data sets through clustering. Using a toy data set and two different case studies (gene expression and protein phosphorylation), we demonstrate the sensitivity of clustering algorithms to noise. Several methods of accounting for this noise can be used to establish when clustering results can be trusted. These methods span a range of assumptions about the statistical properties of the noise and can therefore be applied to virtually any biological data source.
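
The idea of checking when a clustering result can be trusted can be illustrated with a minimal sketch. It is not taken from the article: it assumes additive Gaussian noise with a known standard deviation, uses a plain one-dimensional k-means, and counts how often each pair of points lands in the same cluster across noisy resamples. The function names (`kmeans_1d`, `co_clustering_frequency`), the toy values and the noise level are all hypothetical choices made for illustration.

```python
import random


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on 1-D data (illustrative, not the article's method)."""
    ordered = sorted(values)
    # Assumption: seed the centers at evenly spaced order statistics.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def co_clustering_frequency(values, k, noise_sd, n_resamples=200, seed=0):
    """Fraction of noisy resamples in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(values)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        # Assumed noise model: independent Gaussian noise on every measurement.
        noisy = [v + rng.gauss(0.0, noise_sd) for v in values]
        labels = kmeans_1d(noisy, k)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_resamples for c in row] for row in counts]


# Hypothetical toy data: two tight groups plus one point halfway between them.
toy = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 2.7]
freq = co_clustering_frequency(toy, k=2, noise_sd=0.5)
```

Pairs whose co-clustering frequency stays near 0 or 1 are assigned consistently despite the noise, while intermediate values flag points, such as the midpoint in the toy data, whose cluster membership should not be trusted.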

Journal:
  • Briefings in Bioinformatics

Volume 14, Issue 4

Pages: -

Publication date: 2013